Abstract

This paper studies problems of inferring order given noisy information. In these
problems there is an unknown order (permutation) $\pi$ on $n$ elements denoted by $1, \ldots, n$.
We assume that information is generated in a way correlated with $\pi$. The goal is to
find a maximum likelihood $\pi^*$ given the information observed. We will consider two
different types of observations: noisy comparisons and noisy orders.

• Noisy Orders (also called the Mallow's model). Given the original permutation $\pi$,
the probability of a permutation $\sigma$ being generated is proportional to $e^{-\beta d_K(\sigma,\pi)}$.
In other words, the probability is inverse exponential in the Kemeny distance of
$\pi$ from $\sigma$, which is the number of pairs ordered in $\pi$ differently from $\sigma$:

\[ d_K(\pi,\sigma) = \#\{(i,j) : \pi(i) < \pi(j) \text{ and } \sigma(i) > \sigma(j)\}. \]

We assume that we are given $\sigma_1, \ldots, \sigma_r$ that are generated independently
conditioned on $\pi$.

• Noisy Comparisons. The input is the status of $\binom{n}{2}$ queries of the form $q(i,j)$, for
$i < j$, where $q(i,j) = +$ ($-$) with probability $1/2 + \lambda$ if $\pi(i) > \pi(j)$ ($\pi(i) < \pi(j)$)
for all pairs $i \neq j$, where $\lambda > 0$ is a constant. It is assumed that the errors are
independent. More generally, the input may be any collection of independent
biased signals on the order relationship between pairs of elements.

In this paper we present polynomial time algorithms for solving both problems with
high probability. For noisy orders the running time of the algorithm is $n^{O(1+(\beta r)^{-1})}$,
and for noisy comparisons the algorithm runs in time $n^{O(\lambda^{-4})}$. Both algorithms have
$O(n \log n)$ query complexity (with the constant depending on $\lambda$, $\beta$ and $r$).

As part of our proof we show that for both models the maximum likelihood solution
$\pi^*$ is close to the original permutation $\pi$. More formally, with high probability it holds
that

\[ \sum_i |\pi(i) - \pi^*(i)| = O(n), \qquad \max_i |\pi(i) - \pi^*(i)| = O(\log n). \]

Our results are of interest in applications to ranking, such as ranking in sports, or 
ranking of search items based on comparisons by experts. 



1 Introduction 

We study the problem of sorting in the presence of noise. While sorting linear orders is a 
classical well studied problem, the introduction of noise creates very interesting challenges. 
Noise has to be considered when ranking or sorting is applied in many real life scenarios. 

A natural example comes from sports. How do we rank a league of soccer teams based 
on the outcomes of the games? It is natural to assume that there is a true underlying order 
of which team is better and that the game outcomes represent noisy versions of the pairwise 
comparisons between teams. Note that in this problem it is impossible to "re-sample" the 
order between a pair of teams. As a second example, consider experts ranking various items 
according to their importance. It is natural to assume that the experts' opinions represent 
a noisy view of the actual order of significance. The question is then how to aggregate this 
information? 



1.1 Aggregating rankings: Mallow's Model 

The classical model for noisy permutations was introduced by Mallows [Mal57]. This model
is parameterized by a permutation $\pi^*$ and a real parameter $\beta > 0$. The probability of
observing a permutation $\pi$ is exponentially small in $\beta$ times the distance between $\pi$ and $\pi^*$.
More formally, given the original permutation $\pi^*$, the probability of a permutation $\pi$ being
generated is inverse exponential in the Kemeny distance of $\pi$ from $\pi^*$. The Kemeny distance
is the number of pairs ordered in $\pi$ differently from $\pi^*$:

\[ d_K(\pi,\pi^*) = \#\{(i,j) : \pi^*(i) < \pi^*(j) \text{ and } \pi(i) > \pi(j)\}. \tag{1} \]

Definition 1. In Mallow's model, the probability of a permutation $\pi$ is given by

\[ \mathbf{P}[\pi \mid \pi^*] = \frac{1}{Z(\beta)}\, e^{-\beta d_K(\pi,\pi^*)}, \tag{2} \]
for a $\beta > 0$ and a normalization constant $Z(\beta)$.
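
As a concrete illustration of definition (1), the following minimal Python sketch counts discordant pairs directly. It assumes a ranking is stored as a list `pi` in which `pi[x]` is the position of item `x`, with items indexed from 0 rather than from 1 as in the text; the function name `kemeny_distance` is ours and is not taken from the paper.

```python
from itertools import combinations

def kemeny_distance(pi, pi_star):
    """Number of pairs {i, j} that pi and pi_star order differently.

    Assumed encoding: pi[x] is the position of item x (0-indexed).
    """
    if len(pi) != len(pi_star):
        raise ValueError("rankings must have the same length")
    return sum(
        1
        for i, j in combinations(range(len(pi)), 2)
        # a pair is discordant when the two position differences have opposite signs
        if (pi[i] - pi[j]) * (pi_star[i] - pi_star[j]) < 0
    )

identity = [0, 1, 2, 3]
print(kemeny_distance(identity, identity))      # 0: no pair is reversed
print(kemeny_distance([3, 2, 1, 0], identity))  # 6: all C(4, 2) pairs are reversed
```

The quadratic loop is chosen for readability; a merge-sort based inversion count would bring this down to $O(n \log n)$.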
This model has been studied extensively in statistics and has been generalized in a number
of ways, see e.g. [Dia88, FV86, FV88].

Our goal is to find the best fit for the permutation $\pi^*$ given $r$ independent observations
$\pi_1, \ldots, \pi_r$ that are distributed according to (2).

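Definition 2. The Mallow Reconstruction Problem (MRP) is the problem of finding a $\pi^*$
maximizing the quantity

\[ \prod_{k=1}^{r} \mathbf{P}[\pi_k \mid \pi^*] = \frac{1}{Z(\beta)^r} \prod_{k=1}^{r} e^{-\beta d_K(\pi_k,\pi^*)}, \]

or equivalently minimizing

\[ d(\pi^*) := \sum_{k=1}^{r} d_K(\pi_k, \pi^*). \tag{3} \]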

The optimization problem without any assumptions on the generating process is NP-hard
[BTT89]. On the other hand, a number of heuristics were suggested in the statistical
literature for solving the problem [FV90, CSS99, MPPB07]. None of these heuristics is
guaranteed to find the correct permutation, even assuming the permutations are generated
from the model.

In one of our main results we will show that the MRP problem can be solved in polynomial
time, with a running time that approaches linear as $r$ increases.

1.2 Aggregating noisy comparisons 

We next define a second model for noisy sorting. In this model the noise is applied to each 
pairwise comparison. In other words, for each pair, the correct order is observed with some 
probability greater than 1/2. 

1.2.1 The sorting model: Noisy Signal Aggregation 

We will consider the following probabilistic model of instances. There will be $n$ items denoted
$1, \ldots, n$. There will be a true order given by a permutation $\pi$ on $1, \ldots, n$. For two elements
$i, j \in [n]$ we write $i <_\pi j$ if $\pi(i) < \pi(j)$.

The algorithm will have access to $\binom{n}{2}$ signals defined as follows.

For each unordered pair $\{a, b\}$, it receives a signal $S_{a,b} = S_{b,a}$. The signal distribution $\mathcal{D}$
depends on whether $a <_\pi b$ or $b <_\pi a$:

\[ S_{a,b} \sim \begin{cases} \mathcal{D}_{a<b} & \text{if } \pi(a) < \pi(b), \\ \mathcal{D}_{b<a} & \text{if } \pi(b) < \pi(a). \end{cases} \tag{4} \]

We assume that the signals are independent conditioned on the true order. In other words,
for any set $S = \{(i_1, j_1), (i_2, j_2), \ldots, (i_k, j_k)\}$ of unordered pairs such that $(a, b) \notin S$, and a
vector of signals $s = (s_{i_1 j_1}, s_{i_2 j_2}, \ldots, s_{i_k j_k})$,

\[ \mathbf{P}[S_{a,b} = \cdot \mid \pi, s] = \mathbf{P}[S_{a,b} = \cdot \mid 1_{\pi(a)<\pi(b)}]. \]
The goal of the Noisy Signal Aggregation (NSA) problem defined below is to find a permutation
$\pi$ that is most consistent with the signals.

Definition 3. Given the signals $S_{i,j}$ for all pairs $\{i,j\} \subset [n]$, the Noisy Signal Aggregation is
the maximum likelihood permutation $\pi$, assuming uniform prior. In other words, $\pi$ maximizes
the quantity

\[ \mathbf{P}[\{S_{i,j}\} \mid \pi] = \prod_{\{i,j\}:\ \pi(i)<\pi(j)} \mathcal{D}_{i<j}(S_{i,j}). \tag{5} \]
Given a signal $S_{a,b}$, assuming uniform prior we have

\[ \frac{\mathbf{P}[a <_\pi b \mid S_{a,b}]}{\mathbf{P}[b <_\pi a \mid S_{a,b}]} = \frac{\mathcal{D}_{a<b}(S_{a,b})}{\mathcal{D}_{b<a}(S_{a,b})}. \tag{6} \]

We associate a score $q(a < b)$ with the decision to rank $a$ below $b$ as the log of this ratio:

\[ q(a < b) := \log \frac{\mathcal{D}_{a<b}(S_{a,b})}{\mathcal{D}_{b<a}(S_{a,b})}. \tag{7} \]

Obviously, $q(b < a) = -q(a < b)$. Note that by Gibbs' inequality $\mathbf{E}[q(a < b) \mid \pi(a) < \pi(b)] \ge 0$.
The NSA problem thus can be rephrased as the following problem.

Proposition 4. The NSA Problem is equivalent to the problem of finding a $\sigma$ that maximizes
the total score

\[ s_q(\sigma) := \sum_{i,j:\ \sigma(i)<\sigma(j)} q(i < j). \tag{8} \]

We will discuss several NSA models. The simplest one is defined as follows. 

Definition 5. The Simple Noisy Sorting Aggregation (SNSA) problem with parameter $\lambda$ is
an NSA problem where $S_{a,b} \in \{+, -\}$ for all $a, b$ and

\[ \mathcal{D}_{a>b}(+) = \tfrac12 + \lambda, \quad \mathcal{D}_{a>b}(-) = \tfrac12 - \lambda, \qquad \mathcal{D}_{a<b}(+) = \tfrac12 - \lambda, \quad \mathcal{D}_{a<b}(-) = \tfrac12 + \lambda. \tag{9} \]

Our results showing that the SNSA can be solved efficiently are presented in Section 1.3.
The results are also extended to a much more general family of NSA problems.
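
To make the simple model concrete, here is a small Python sketch that draws one $\pm$ signal per pair as in (9), converts each signal into the log-ratio score of (7), and evaluates the total score of (8). The helper names `sample_snsa_scores` and `total_score`, the 0-indexed items, and the parameter values are illustrative assumptions of ours, not part of the paper.

```python
import math
import random

def sample_snsa_scores(pi, lam, rng):
    """Draw one +/- signal per pair and return the score matrix q.

    pi[x] is the (hidden) position of item x; q[a][b] plays the role of
    q(a < b), so q[b][a] == -q[a][b].
    """
    n = len(pi)
    w = math.log((0.5 + lam) / (0.5 - lam))  # magnitude of one log-likelihood ratio
    q = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            p_plus = 0.5 + lam if pi[a] > pi[b] else 0.5 - lam
            plus = rng.random() < p_plus
            # a '+' signal is evidence for pi(a) > pi(b), i.e. against "a < b"
            q[a][b] = -w if plus else w
            q[b][a] = -q[a][b]
    return q

def total_score(sigma, q):
    """Sum of q(i < j) over all pairs that sigma ranks with i before j."""
    n = len(sigma)
    return sum(q[i][j] for i in range(n) for j in range(n) if sigma[i] < sigma[j])

rng = random.Random(0)
n, lam = 30, 0.3
truth = list(range(n))
rng.shuffle(truth)                       # hidden true positions
q = sample_snsa_scores(truth, lam, rng)
reverse = [n - 1 - p for p in truth]     # the exactly reversed order
print(total_score(truth, q), total_score(reverse, q))
```

Because reversing an order flips every pair, the two printed scores are negatives of each other; with a bias of this size the true order typically receives the positive one.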



1.2.2 Related Sorting Models and Results 

It is natural to consider the problem of finding a ranking $\sigma$ that minimizes the score $s_q(\sigma)$
where the input $q$ takes only the values $\pm 1$ (a relation between every pair), and there
are no probabilistic assumptions on the input. This problem, called the feedback arc set
problem for tournaments, is known to be NP-hard [ACN05, Alo06]. However, it does admit a
PTAS [KMS07] achieving a $(1 + \epsilon)$ approximation for



in time that is polynomial in $n$ and doubly exponential in $1/\epsilon$. The results of [KMS07]
are the latest in a long line of work starting in the 1960s and including [ACN05, Alo06].
See [KMS07] for a detailed history of the feedback arc set problem.

A problem that is in a sense easier than NSA is the problem where repetitions are
allowed in querying. In this case it is easy to observe that the original order may be recovered
in $O(n \log^2 n)$ queries with high probability. Indeed, one may perform any of the standard
$O(n \log n)$ sorting algorithms and repeat each query $O(\log n)$ times in order to obtain the
actual order between the queried elements with error probability polynomially small in $n$.
More sophisticated methods show that in fact the true order may be found in query
complexity $O(n \log n)$ with high probability [FPRU90], see also [KK07].
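
The repetition argument can be sketched in a few lines of Python. The sketch assumes a resampleable comparison oracle that is correct with probability $1/2 + \lambda$ independently on each call; the names `make_noisy_less` and `sort_with_repetition`, and the constant 20 in the number of repetitions, are our own illustrative choices.

```python
import math
import random
from functools import cmp_to_key

def make_noisy_less(pi, lam, rng):
    """Return a resampleable noisy comparison, correct with probability 1/2 + lam."""
    def noisy_less(a, b):
        truth = pi[a] < pi[b]
        return truth if rng.random() < 0.5 + lam else not truth
    return noisy_less

def sort_with_repetition(items, noisy_less, reps):
    """Run a standard comparison sort, deciding each comparison by majority vote."""
    def cmp(a, b):
        votes = sum(noisy_less(a, b) for _ in range(reps))  # fresh query each time
        return -1 if 2 * votes > reps else 1
    return sorted(items, key=cmp_to_key(cmp))

rng = random.Random(1)
n, lam = 32, 0.3
pi = list(range(n))
rng.shuffle(pi)                                # hidden true positions
reps = 20 * math.ceil(math.log2(n)) + 1        # O(log n) repetitions, odd to avoid ties
order = sort_with_repetition(range(n), make_noisy_less(pi, lam, rng), reps)
print([pi[x] for x in order] == list(range(n)))
```

This uses $O(n \log n)$ comparisons of $O(\log n)$ queries each, which is the $O(n \log^2 n)$ bound mentioned above. It does not apply to the NSA setting, where each pair can be queried only once.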

Remark 6. Some of our results on the SNSA problem appeared as an extended abstract
in [BM08].

1.3 Main Results 

1.3.1 Mallow Reconstruction Problem 

For the Mallow Reconstruction Problem our main result is that the problem can be solved
in time that tends to linear as $r$ increases. Formally, we prove the following:

Theorem 7. There exists a randomized algorithm such that if $\pi_1, \ldots, \pi_r$ are rankings on $n$
elements independently generated by Mallow's model with parameter $\beta > 0$, and $\alpha > 0$,
then a maximum probability order $\pi^m$ can be computed in time

\[ T(n) = O\!\left(n^{1+O\left(\frac{\alpha}{\beta r}\right)}\, 2^{O\left(\frac{\alpha}{\beta}+\frac{1}{\beta^2}\right)} \log^2 n\right) \]

and error probability $< n^{-\alpha}$. In particular, the running time tends to almost linear as $r$ grows.
1.3.2 Simple Noisy Signal Aggregation 

For the Simple Noisy Signal Aggregation problem, our main result is the following. 

Theorem 8. For any $\lambda > 0$ and $\alpha > 0$ there exists a randomized algorithm that except with
probability at most $n^{-\alpha}$ finds an optimal solution to the Simple Noisy Signal Aggregation
(SNSA) problem with parameter $\lambda$ in time $n^{O((\alpha+1)\lambda^{-4})}$.
1.3.3 General Noisy Aggregation

Our results extend to more general models of NSA aggregations which we now discuss. In
order for our aggregative reconstruction to work, we will need two properties from the signal
distributions.

Definition 9. We say that a collection of distributions $\mathcal{D}_{a<b}, \mathcal{D}_{b<a}$ is strongly $\gamma$-biased if
(a) For every $0 < m < n$, and for any $m$ different $\mathcal{D}_{a_k, b_k}$ such that $a_k <_\pi b_k$ for at least
2/3 of the $k$'s:

\[ \mathbf{P}\!\left[\sum_{k=1}^{m} q(a_k < b_k) > 0\right] > 1 - 2^{-\gamma m}. \tag{10} \]



(b) There is a constant $A$ such that for any $A$ different $\mathcal{D}_{a_k, b_k}$ such that $a_k <_\pi b_k$ holds for all
the $k$'s,

\[ \mathbf{P}\!\left[\sum_{k=1}^{A} q(a_k < b_k) > 0\right] > 1 - 10^{-1}. \tag{11} \]

Under these conditions we prove the following. 

Theorem 10. For any $\gamma > 0$ and $\alpha > 0$ there exists a randomized algorithm that except
with probability at most $n^{-\alpha}$ finds an optimal solution to the Noisy Signal Aggregation (NSA)
problem on strongly $\gamma$-biased signals in time $n^{\tilde{O}((\alpha+1)\gamma^{-4})}$.

In the statement above and throughout the paper $\tilde{O}(\cdot)$ signifies order of magnitude up to
logarithmic corrections in the variables in the expression inside the $\tilde{O}(\cdot)$. A key ingredient
in the proof of Theorem 10 is the following.

Theorem 11. Consider the NSA problem on strongly $\gamma$-biased signals and let $\pi$ be the true
order and $\sigma$ be any optimal order. Let $\alpha > 0$. Then there exist constants $c_1(\alpha, \gamma)$ and $c_2(\alpha, \gamma)$
such that except with probability $O(n^{-\alpha})$ the following inequalities hold:

\[ \sum_{i=1}^{n} |\sigma(i) - \pi(i)| \le c_1 n, \tag{12} \]

\[ \max_i |\sigma(i) - \pi(i)| \le c_2 \log n. \tag{13} \]



Extending the techniques of [FPRU90] it is possible to obtain the results of Theorem 10
with low sampling complexity. More formally,

Theorem 12. There is an implementation of a sorting algorithm with the same guarantees
as in Theorem 10 and whose sampling complexity is $C n \log n$ where $C = C(\alpha, \gamma, A)$.

In fact, Theorems 10 and 11 only require condition (a) from Definition 9. Condition
(b) is only used to establish low sampling complexity in Theorem 12. We note that without
condition (b) Theorem 12 holds with sampling complexity of $O(n \log^2 n)$ rather than
$O(n \log n)$.

We briefly note that from the Azuma inequality it follows that

Claim 13. SNSA distributions with parameter $\lambda$ are strongly $\gamma$-biased with $\gamma = \Omega(\lambda)$.

Therefore Theorem 8 follows from Theorems 10 and 12. More generally we have the
following claim that gives a large set of strongly $\gamma$-biased distributions:

Claim 14. Consider the NSA problem where there exists a constant $C$ such that for all $a, b$
the functions $q(a, b)$ and $q(b, a)$ are bounded by $C$ and

\[ \mathbf{E}[q(a < b) \mid \pi(a) < \pi(b)] > \lambda \quad \text{and} \quad \mathbf{E}[q(b < a) \mid \pi(b) < \pi(a)] > \lambda. \]

Then the distributions $\mathcal{D}_{a<b}, \mathcal{D}_{a>b}$ are strongly $\gamma$-biased with $\gamma = \Omega(\lambda/C)$.

1.4 Techniques 

1.4.1 Mallow Reconstruction Problem 

In the Mallow Reconstruction Problem we need to aggregate $r$ noisy orderings $\pi_1, \ldots, \pi_r$ into
one optimal ordering $\pi^m$. It seems intuitively natural to try to "average" these orderings into
one ordering $\bar{\pi}$. It turns out that this intuition is correct, and in fact just taking the average
of the locations of element $x$ under the $\pi_j$'s locates it within a distance of $O(\frac{1}{r} \log n)$ from
its location in the true order $\pi^*$ with high probability. Note that this distance decreases as
$r$ is increased.
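
The averaging step itself is short. The Python sketch below assumes each observed ranking is given as a list of positions (`rankings[k][x]` is the position of item `x` in the $k$-th ranking, 0-indexed) and breaks ties by item label; both conventions, and the name `average_rank_order`, are our assumptions for illustration.

```python
def average_rank_order(rankings):
    """Return the items sorted by their mean position over all observed rankings.

    rankings[k][x] = position of item x in the k-th ranking (0-indexed).
    """
    r, n = len(rankings), len(rankings[0])
    mean_pos = [sum(pi[x] for pi in rankings) / r for x in range(n)]
    # ties in the mean position are broken by item label
    return sorted(range(n), key=lambda x: (mean_pos[x], x))

samples = [
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [0, 2, 1, 3],
]
print(average_rank_order(samples))  # [0, 1, 2, 3]
```

The resulting order is only a starting point: it is close to the true order pointwise, but it need not minimize the total Kemeny distance to the samples.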

Somewhat surprisingly, the bulk of the work goes into showing that the optimal ordering
$\pi^m$ is pointwise close to the true ordering $\pi^*$. This is important since we want to show that
the "average" $\bar{\pi}$ is close to $\pi^m$, but can only show directly that it is close to $\pi^*$.

Our algorithm uses the "average" order $\bar{\pi}$ as a starting point for a dynamic programming
algorithm from Section 2 that finds the optimum $\pi^m$. The results of this section may be
of independent interest in cases where we are looking for an optimum order and have a
pointwise good initial guess for it.

1.4.2 Noisy Signal Aggregation 

In order to obtain a polynomial time algorithm for the NSA problem it is important to
identify that any optimal solution to the problem is close to the true one. Thus the main
step of the analysis is the proof of Theorem 11.

To perform the sorting efficiently we use an insertion algorithm. Given an optimal order
on a subset of the items we show how to insert a new element. Since the optimal order both
before and after the insertion of the element has to satisfy Theorem 11 it is also the case
that no element moves more than $O(\log n)$ after the insertion and re-sorting. Using this we
perform a "re-sorting" using the dynamic programming algorithm in Section 2.

The main task is proving Theorem 11 in Section 4.1. We first prove (12) by showing
that for a large enough constant $c$, it is unlikely that any order $\sigma$ whose total distance from
the original order $\pi$ is more than $cn$ will have $s_q(\sigma) \ge s_q(\pi)$. We then establish (13) in
Section 4.1.2 using a bootstrap argument. The argument is based on the idea that if the
discrepancy in the position of an element $a$ in an optimal order compared to the original
order is more than $c \log n$ for a large constant $c$, then there must exist many elements that
are "close" to $a$ that have also moved by much. This then leads to a contradiction with (12)
applied to the neighborhood of $a$.

The final analysis of the insertion algorithm and the proof of Theorem 10 are provided in
Section 4.2. Section 4.3 shows how using a variant of the sorting algorithm it is possible to
achieve polynomial running time with sampling complexity $O(n \log n)$, thus proving Theorem 12.

It is natural to ask whether the algorithm proposed here is applicable in the more general
feedback arc set problem and whether other efficient algorithms for the more general problem
are applicable here. It is easy to see that the "sorting by number of wins" algorithm, whose
approximation ratio has been recently studied [CFR06], will result with high probability
in an order $\sigma'$ with $s_q(\sigma') > n^{3/2-\epsilon} + s_q(\pi)$ for any $\epsilon > 0$ even for a simple Bernoulli $q$. A
similar statement holds for a greedy algorithm where elements are inserted optimally one at
a time. With more work it is possible to show that the algorithm presented here does not
provide a PTAS for the feedback arc set problem on tournaments and that the complicated
algorithm of [KMS07] does not solve the problem presented here.

1.4.3 Comparing the Two Sorting Problems 

It is interesting to compare the two sorting problems studied here. The two generative
models seem to be very closely related. In fact it is easy to see that if one looks at the
random tournament defined by the noisy comparisons model and conditions on it being
a permutation, then one recovers the Mallow model. However, the conditioning on the
tournament is a very strong conditioning as we condition on an event whose probability is
$2^{-\Omega(n^2)}$. This conditioning also has very strong consequences: for example, with constant
probability the minimal element in the original $\pi^*$ will also be the minimal element in
the generated order $\pi$. Such a property does not hold for the noisy comparisons model,
as it is easy to see that the probability that the minimal element in $\pi^*$ will satisfy the
maximal number of "less than or equal" relations in the noisy input is $n^{-1/2+o(1)}$. In fact, as we will see
below, in the noisy order model each generated permutation $\pi$ satisfies with high probability
that $\max_i |\pi^*(i) - \pi(i)| = O(\log n)$, so in a sense each permutation is already close to the
original permutation. For the noisy comparisons problem it is much harder to construct
any permutation $\pi$ satisfying the condition above, and this is one of the main algorithmic
challenges we need to overcome.
1.5 Distances between rankings

Here we define a measure of distance between rankings that will be used later, and introduce
some notation. First, given two permutations $\sigma$ and $\tau$ we define the dislocation distance by

\[ d(\sigma, \tau) = \sum_{i=1}^{n} |\sigma(i) - \tau(i)|. \]

Recall that the Kemeny distance $d_K(\sigma, \tau)$ is the number of pairs on which $\sigma$ and $\tau$
disagree. We will write $d(\sigma)$ for $d(\sigma, id)$ where $id$ is the identity permutation and $d_K(\sigma)$ for
$d_K(\sigma, id)$. In this paper we will often use the following well known claim [DG77] relating the
two distances.

Claim 15. For any $\tau$,

\[ \tfrac{1}{2}\, d(\tau) \le d_K(\tau) \le d(\tau). \]
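
The two-sided bound of Claim 15 is easy to sanity-check by brute force on small permutations. The Python sketch below does this exhaustively for up to six elements; it is a numerical check under our own 0-indexed encoding (`tau[i]` is the image of `i`), not a proof.

```python
from itertools import combinations, permutations

def dislocation(tau):
    """d(tau) = sum_i |tau(i) - i|, the dislocation distance to the identity."""
    return sum(abs(t - i) for i, t in enumerate(tau))

def inversions(tau):
    """d_K(tau): number of pairs i < j with tau(i) > tau(j)."""
    return sum(1 for i, j in combinations(range(len(tau)), 2) if tau[i] > tau[j])

# Exhaustive check of d/2 <= d_K <= d for all permutations of up to 6 elements.
claim_holds = all(
    dislocation(tau) <= 2 * inversions(tau) <= 2 * dislocation(tau)
    for n in range(1, 7)
    for tau in permutations(range(n))
)
print(claim_holds)  # True
```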

1.6 Acknowledgment 

E. M. thanks Andrew Tomkins for inspirational discussions and Marina Meila for interesting 
discussions on Mallow's model. 

2 Sorting an almost sorted list 

In this section we present an algorithm that, given a pre-sorted list in which each element is at
most $k$ positions away from its location in some optimal ordering, finds an optimal ordering
in time $O(n \cdot k^2 \cdot 2^{6k})$. The algorithm will be used as a building block for other algorithms
in the paper.

Lemma 16. Let $[n]$ be $n$ elements together with a scoring function $q$. Suppose that we are
given that there is an optimal ordering $\sigma(1), \sigma(2), \ldots, \sigma(n)$ that maximizes the score

\[ s_q(\sigma) = \sum_{\sigma(i)<\sigma(j)} q(i < j), \]

such that $|\sigma(i) - i| \le k$ for all $i$. Then we can find such an optimal $\sigma$ in time $O(n \cdot k^2 \cdot 2^{6k})$.

In the applications below $k$ will be $O(\log n)$. When $k$ is small ($o(\log n)$), the running time
of the algorithm tends to linear. Note that a brute force search over all possible $\sigma$ would
require time $k^{\Omega(n)}$. Instead we use dynamic programming to reduce the running time.

Proof. We use a dynamic programming technique to find an optimal sorting. Let $i < j$ be
any indices, then by the assumption, the elements in the optimally ordered interval

\[ I = \{\sigma(i), \sigma(i+1), \ldots, \sigma(j)\} \]

satisfy $I^- \subset I \subset I^+$ where

\[ I^+ = [i - k, j + k], \quad \text{and} \quad I^- = [i + k, j - k]. \]

Hence selecting the set $S_I = \{\sigma(i), \sigma(i+1), \ldots, \sigma(j)\}$ involves choosing a set of size $j - i + 1$
that contains the elements of $I^-$ and is contained in $I^+$. This involves selecting $2k$ elements
from the list (or from a subset of the list)

\[ [i - k, \ldots, i + k - 1] \cup [j - k + 1, \ldots, j + k] \]

which has $4k$ elements. Thus the number of such $S_I$'s is bounded by $2^{4k}$.

We may assume without loss of generality that $n$ is an exact power of 2. Denote by $I_0$
the interval containing all the elements. Denote by $I_1$ the left half of $I_0$ and by $I_2$ its right
half. Denote by $I_3$ the left half of $I_1$ and so on. In total, we will have $n - 1$ intervals of
lengths $2, 4, 8, \ldots$.

For each $I_t = [i, \ldots, j]$ let $S_t$ denote the possible ($< 2^{4k}$) sets of the elements $I'_t =
[\sigma(i), \ldots, \sigma(j)]$. We use dynamic programming to store an optimal ordering $\sigma'$ of each such
$I'_t \in S_t$. The total number of $I'_t$'s we will have to consider is bounded by $n \cdot 2^{4k}$. In addition,
for each processed interval $I'_t$ we store its optimal score $s'(I'_t, \sigma')$, such that

\[ s'(I'_t, \sigma') = \sum_{\sigma'(i')<\sigma'(j'),\ i'<j'<i'+2k} q(i' < j'). \]

In other words, we only sum over pairs in $I'_t$ that are less than $2k$ apart, which
are the only pairs that potentially may get swapped. Note that the actual score $s(I'_t, \sigma')$ is
shifted from $s'(I'_t, \sigma')$ by an amount that is independent of $\sigma'$:

\[ s(I'_t, \sigma') = \sum_{\sigma'(i')<\sigma'(j')} q(i' < j') = \sum_{\sigma'(i')<\sigma'(j'),\ i'<j'<i'+2k} q(i' < j') + \sum_{\sigma'(i')<\sigma'(j'),\ j' \ge i'+2k} q(i' < j') = s'(I'_t, \sigma') + \sum_{j' \ge i'+2k} q(i' < j'). \]

Hence maximizing $s'(I'_t, \sigma')$ is equivalent to maximizing the actual score $s(I'_t, \sigma')$.

We proceed from $t = n - 1$ down to $t = 0$ producing and storing an optimal sort for each
possible $I'_t$. For $t = n - 1, n - 2, \ldots, n/2$ the length of each $I'_t$ is 2, and the optimal sort can
be found in $O(1)$ steps.

Now let $t < n/2$. We are trying to find an optimal sort of a given $I'_t$ corresponding to $I_t = [i, i + 2s - 1]$.
We do this by dividing the optimal sort into two halves $I_l$ and $I_r$ and trying to sort them
separately. We know that $I_l$ must contain all the elements in $I'_t$ that come from the interval
$[1, \ldots, i + s - 1 - k]$ and must be contained in the interval $[1, \ldots, i + s - 1 + k]$. Thus there
are at most $2^{2k}$ choices for the elements of $I_l$, and the choice of $I_l$ determines $I_r$ uniquely.
For each such choice we look up an optimum solution for $I_l$ and for $I_r$ in the dynamic
programming table. Among all possible choices of $I_l$ we pick the best one. This is done by
recomputing the score $s'$ for the joined interval, and takes at most $O(k^2)$ time, since the only
new pairs with $|i' - j'| < 2k$ are along the boundary between $I_l$ and $I_r$. Thus the total
cost will be

\[ \sum_{i=1}^{\log n} \#\{\text{intervals of length } 2^i\} \cdot \#\text{checks} \cdot \text{cost of check} = \sum_{i=1}^{\log n} O\!\left(\frac{n}{2^i} \cdot 2^{4k} \cdot 2^{2k} \cdot k^2\right) = O(n \cdot k^2 \cdot 2^{6k}). \]

□
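
To illustrate why a pointwise bound of $k$ makes the search tractable, here is a Python sketch of a simpler left-to-right dynamic program. It is not the divide-and-conquer scheme of the proof above: it fills positions one at a time and memoizes on the set of already placed items inside a sliding window of width $2k + 1$, so its state space is roughly $n \cdot 2^{2k}$. Items and positions are 0-indexed, `q[a][b]` stands for $q(a < b)$, and the names `best_banded_order` and `brute_force` are ours.

```python
from functools import lru_cache
from itertools import permutations
import random

def best_banded_order(q, k):
    """Maximize the sum of q[a][b] over pairs with a placed before b, subject to
    every item e sitting at a position p with |e - p| <= k.

    Returns (best_score, order), where order[p] is the item at position p.
    """
    n = len(q)

    @lru_cache(maxsize=None)
    def solve(p, used):
        # Invariant: all items < p - k are placed; `used` holds the placed items >= p - k.
        if p == n:
            return 0, ()
        lo = p - k
        if lo >= 0 and lo not in used:
            candidates = [lo]  # position p is the last admissible slot for item lo
        else:
            candidates = [e for e in range(max(lo, 0), min(n, p + k + 1))
                          if e not in used]
        best = None
        for e in candidates:
            # e is placed after every item placed so far
            gain = (sum(q[u][e] for u in range(max(lo, 0)))
                    + sum(q[u][e] for u in used))
            nxt = frozenset(u for u in used | {e} if u > lo)
            rest_score, rest = solve(p + 1, nxt)
            if best is None or gain + rest_score > best[0]:
                best = (gain + rest_score, (e,) + rest)
        return best

    score, order = solve(0, frozenset())
    return score, list(order)

def brute_force(q, k):
    """Reference answer by enumerating all admissible orders (tiny n only)."""
    n = len(q)
    return max(
        sum(q[o[i]][o[j]] for i in range(n) for j in range(i + 1, n))
        for o in permutations(range(n))
        if all(abs(e - p) <= k for p, e in enumerate(o))
    )

rng = random.Random(0)
n, k = 7, 2
q = [[0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        q[a][b] = rng.randint(-3, 3)
        q[b][a] = -q[a][b]
score, order = best_banded_order(q, k)
print(score == brute_force(q, k), order)
```

The sketch recurses once per position, so for long lists it would need an iterative rewrite; it is meant only to show how the band constraint limits the number of states.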



3 Noisy ordering aggregation 

We will now turn our attention to aggregating noisy rankings generated by Mallow's model.
Recall that in this model, the probability of a permutation $\pi$ given a true ordering $\pi^*$ is
given by

\[ \mathbf{P}[\pi \mid \pi^*] = \frac{1}{Z(\beta)}\, e^{-\beta d_K(\pi,\pi^*)}, \tag{14} \]

where $d_K(\pi, \pi^*)$ is the Kemeny distance, that is, the number of pairs which $\pi$ and $\pi^*$ order
differently. As a first step we show that under this model, locations of individual elements are
distributed geometrically.

Lemma 17. Let $a$ be an element that is ranked $k$-th by $\pi^*$. In other words, $\pi^*(a) = k$. Then

\[ \mathbf{P}[|\pi(a) - k| \ge i] \le 2 \cdot \frac{e^{-\beta i}}{1 - e^{-\beta}} \]

for all $i$.

Proof. For simplicity, we assume that $\pi^*$ is the identity map: $\pi^*(i) = i$. The key observation
in the proof is that for any $m$, the distribution of the locations of $m + 1, \ldots, n$ under $\pi$
remains the same if we condition on the ordering of $\{1, \ldots, m\}$ between themselves under
$\pi$. Thus $\pi$ can be sampled by inserting the elements $1, \ldots, n$ into the ordering one-by-one,
each time conditioning on the order so far.

Suppose we sampled the relative ordering of $1, \ldots, k - 1$ under $\pi$, and would like to insert
a new element $k$. By (14), the probability of $k$ being mapped to location $k - i$ is bounded
by $e^{-\beta i}$. Note that after further insertions, the location of $k$ may only increase. Hence

\[ \mathbf{P}[\pi(k) \le k - i] \le \sum_{j=i}^{\infty} e^{-\beta j} = \frac{e^{-\beta i}}{1 - e^{-\beta}}. \tag{15} \]

A symmetric argument gives the same bound for $\mathbf{P}[\pi(k) \ge k + i]$, and completes the proof. □
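
The insertion view used in the proof also gives a direct way to sample from the model. The Python sketch below is the standard repeated-insertion sampler for Mallow's model centred at the identity, which we believe matches the sampling procedure the proof describes: when the item with label `item` is inserted, it is placed so that it creates $j$ new inversions with probability proportional to $e^{-\beta j}$. Items are 0-indexed and the name `sample_mallows` is ours.

```python
import math
import random

def sample_mallows(n, beta, rng):
    """Sample an ordering of items 0..n-1 with probability proportional to
    exp(-beta * d_K(order, identity)), by inserting the items one at a time."""
    order = []
    for item in range(n):
        # Leaving j smaller items after `item` creates exactly j new inversions.
        weights = [math.exp(-beta * j) for j in range(item + 1)]
        j = rng.choices(range(item + 1), weights=weights)[0]
        order.insert(item - j, item)
    return order

rng = random.Random(0)
sample = sample_mallows(10, 1.0, rng)
max_shift = max(abs(pos - item) for pos, item in enumerate(sample))
print(sample, max_shift)
```

Printing `max_shift` for growing $n$ gives an empirical feel for the geometric tails of Lemma 17: individual items rarely move far from their true location.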

Next, we assume that we are given $r$ independent samples generated by Mallow's model.
In each one of them, the location of $k$ is geometrically distributed around $k$. This allows us
to prove a stronger concentration for the average of these locations. Again, for simplicity we
assume that $\pi^*$ is the identity $\pi^*(i) = i$.

Lemma 18. Suppose that the permutations $\pi_1, \ldots, \pi_r$ are drawn according to (14). Let $a = k$
be the element ranked $k$-th by $\pi^*$. Let $\bar{\pi}(a)$ be the average index of a under the permutations
$\pi_1, \ldots, \pi_r$:

\[ \bar{\pi}(a) = \frac{1}{r} \sum_{i=1}^{r} \pi_i(a). \]

Then

\[ \mathbf{P}[|\bar{\pi}(a) - k| \ge i] \le 2 \cdot \frac{(5i + 1)^r \cdot e^{-\beta r i}}{(1 - e^{-\beta})^r} \]

for all $i$.



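Proof. For a vector $b = (b_1, \ldots, b_r)$ of non-negative integers let $A_b$ denote the event that
$\pi_j(a) \le k - b_j$ for all $j = 1, \ldots, r$ for which $b_j > 0$. By (15) and independence we have

\[ \mathbf{P}[A_b] \le \prod_{j:\ b_j > 0} \frac{e^{-\beta b_j}}{1 - e^{-\beta}} \le \frac{e^{-\beta (b_1 + \cdots + b_r)}}{(1 - e^{-\beta})^r}. \]

Next, we note that the event $[\bar{\pi}(a) \le k - i]$ is covered by

\[ \bigcup_{b:\ b_1 + \cdots + b_r = ri} A_b. \]

Hence

\[ \mathbf{P}[\bar{\pi}(a) \le k - i] \le \#\{b : b_1 + \cdots + b_r = ri\} \cdot \frac{e^{-\beta r i}}{(1 - e^{-\beta})^r} = \binom{ri + r - 1}{r - 1} \frac{e^{-\beta r i}}{(1 - e^{-\beta})^r} < \frac{(5i + 1)^r \cdot e^{-\beta r i}}{(1 - e^{-\beta})^r}. \]

Taking the symmetric bound for $\mathbf{P}[\bar{\pi}(a) \ge k + i]$ completes the proof. □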

In particular, assuming $r$ is fixed, the following statement holds.

Claim 19. Let $\alpha > 0$. Then for sufficiently large $n$,

\[ \mathbf{P}\!\left[|\bar{\pi}(k) - k| \ge \frac{\alpha + 2}{\beta \cdot r} \log n \ \text{ for some } k\right] < n^{-\alpha}. \]

Proof. The claim follows immediately from Lemma 18. □

We see that the margin of error for each element shrinks in proportion to $1/r$. We will
now use Lemma 16 from Section 2 to give an efficient algorithm that finds the maximum
likelihood permutation $\pi^m$ given $\pi_1, \ldots, \pi_r$. Recall that such a $\pi^m$ minimizes

\[ \sum_{k=1}^{r} d_K(\pi_k, \pi^m) = \sum_{k=1}^{r} \sum_{\pi^m(i)<\pi^m(j)} 1_{\pi_k(i)>\pi_k(j)} = \sum_{\pi^m(i)<\pi^m(j)} \#\{k : \pi_k(i) > \pi_k(j)\}. \tag{16} \]

Set $q(i < j) := \#\{k : \pi_k(i) < \pi_k(j)\}$. Then minimizing (16) is equivalent to maximizing

\[ s_q(\pi^m) = \sum_{\pi^m(i)<\pi^m(j)} q(i < j). \]
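
The equivalence between minimizing (16) and maximizing the pairwise-count score can be checked numerically. In the Python sketch below, `q[i][j]` counts the rankings that place `i` before `j`, and the final check confirms that for every candidate order the Kemeny sum and the score add up to the constant $r \binom{n}{2}$. The function names and the 0-indexed position-list encoding are our assumptions.

```python
from itertools import combinations, permutations

def pairwise_counts(rankings):
    """q[i][j] = number of observed rankings that place item i before item j."""
    n = len(rankings[0])
    q = [[0] * n for _ in range(n)]
    for pi in rankings:                      # pi[x] = position of item x
        for i, j in combinations(range(n), 2):
            if pi[i] < pi[j]:
                q[i][j] += 1
            else:
                q[j][i] += 1
    return q

def kemeny_sum(rankings, sigma):
    """Total number of pairwise disagreements between sigma and the rankings."""
    n = len(sigma)
    return sum(1 for pi in rankings for i, j in combinations(range(n), 2)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def score(q, sigma):
    """Sum of q[i][j] over all pairs that sigma ranks with i before j."""
    n = len(sigma)
    return sum(q[i][j] for i in range(n) for j in range(n) if sigma[i] < sigma[j])

rankings = [[0, 1, 2, 3], [1, 0, 2, 3], [0, 2, 1, 3]]
q = pairwise_counts(rankings)
r, pairs = len(rankings), 4 * 3 // 2
# For every sigma the two objectives add up to the constant r * C(n, 2).
print(all(kemeny_sum(rankings, s) + score(q, s) == r * pairs
          for s in permutations(range(4))))  # True
```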



Let $\tilde{\pi}$ be the ordering of the elements $\{k\}$ sorted according to their $\bar{\pi}(k)$ value. By Claim 19 it follows
that except with probability $n^{-\alpha}$,

\[ |\tilde{\pi}(k) - \pi^*(k)| < 2 \cdot \frac{\alpha + 2}{\beta \cdot r} \log n \quad \text{for all } k. \tag{17} \]

In order to apply Lemma 16 to obtain the optimum $\pi^m$ from the approximation $\tilde{\pi}$ it
remains to see that with high probability the optimum $\pi^m$ is pointwise close to the original
$\pi^*$ (and hence, by (17), to $\tilde{\pi}$). For simplicity, we assume that $\pi^*$ is the identity order $1, \ldots, n$.

Denote

\[ L = \max\!\left(6 \cdot \frac{\alpha + 2}{\beta \cdot r} \log n,\ \ 6 \cdot \frac{\alpha + 2 + 1/\beta}{\beta}\right). \]


We first use (15) to prove the following simple claim.

Claim 20. Except with probability $n^{-\alpha}$ we have that for any $i, j$ such that $i < j - L$,

\[ q(i < j) > \frac{2r}{3}. \]

In other words, less than 1/3 of the permutations $\pi_1, \ldots, \pi_r$ order $i$ and $j$ incorrectly.

Proof. By a direct application of (15), for each $k$,

\[ \mathbf{P}[\pi_k(j) < \pi_k(i)] \le \mathbf{P}[\pi_k(j) < j - L/2] + \mathbf{P}[\pi_k(i) > i + L/2] \le 2 \cdot \frac{e^{-\beta L/2}}{1 - e^{-\beta}} < n^{-3(\alpha+1)/r} \]

for a sufficiently large $n$. In the case when $r < \log n$, the probability of having at least $r/3$
rearranged pairs is bounded by $n^{-(\alpha+1)} \cdot 2^r < n^{-\alpha}$. In the case when $r \ge \log n$, we have

\[ \mathbf{P}[\pi_k(j) < \pi_k(i)] < e^{-3(\alpha+1)}, \]

and the probability of having at least $r/3$ rearranged pairs is bounded by

\[ e^{-(\alpha+1) r} \cdot 2^r < n^{-\alpha}. \]

□

We are now ready to prove the lemma on the proximity of the optimum to the original.

Lemma 21. Except with probability $< 2 \cdot n^{-\alpha}$, for any optimal $\pi^m$ and for all $k$, we have

\[ |\pi^m(k) - \pi^*(k)| \le 32L, \]

where $\pi^*$ is the original permutation.
Proof. We will assume that the sampled permutations $\pi_1, \ldots, \pi_r$ satisfy the property in Claim
20, which happens except with probability of at most $n^{-\alpha}$. Suppose, for contradiction, that
there is a $k$ such that $|\pi^m(k) - k| = M > 32L$. Without loss of generality suppose that
$\pi^m(k) = k + M$.

We first claim that there must be at least $T \ge M/4 - L > 7L$ indexes $i < k$ such that
$\pi^m(i) > k$. That is, many indexes move from below position $k$ to above position $k$. Let $S$
be the set of indexes $j$ such that $k \le \pi^m(j) < k + M$. We must have

\[ \sum_{j \in S} \big(q(j < k) - q(j > k)\big) \ge 0, \]

for otherwise the permutation obtained from $\pi^m$ by moving $k$ back to location $k$ would score higher
than $\pi^m$. We split $S$ into $S_1$, $S_2$ and $S_3$ as follows:

\[ S = S_1 \cup S_2 \cup S_3 = \{j \in S : j < k\} \cup \{j \in S : k < j \le k + L\} \cup \{j \in S : j > k + L\}. \]

Note that $|S_2| \le L$. Hence, by our assumption,

\[ \sum_{j \in S} \big(q(j < k) - q(j > k)\big) = \sum_{j \in S_1} \big(q(j < k) - q(j > k)\big) + \sum_{j \in S_2} \big(q(j < k) - q(j > k)\big) + \sum_{j \in S_3} \big(q(j < k) - q(j > k)\big) \]
\[ \le r \cdot |S_1| + r \cdot |S_2| - (r/3) \cdot |S_3| \le r \cdot (T + L) - (r/3) \cdot (M - T - L). \]

Hence $T + L - (M - T - L)/3 \ge 0$, which gives $T \ge M/4 - L$ and therefore $T > 7L$.

The fact that there are $T$ indexes $i < k$ such that $\pi^m(i) > k$ implies that there are at
least $T$ indexes $i > k$ with $\pi^m(i) < k$. Denote

\[ T_1 = \{i < k : \pi^m(i) > k\}, \qquad T_2 = \{i > k : \pi^m(i) < k\}. \]

Let $\pi^{m'}$ be the permutation obtained from $\pi^m$ by concatenating its restriction to $H_1 =
\{1, \ldots, k - 1\}$ with its restriction to $H_2 = \{k, \ldots, n\}$. We claim that $\pi^{m'}$ scores higher than
$\pi^m$, which is a contradiction. We first count the number of pairs $(i < j)$ on which $\pi^m$ and
$\pi^{m'}$ disagree such that $|i - j| \le L$. To disagree, either $i$ or $j$ has to belong to $T_1 \cup T_2$, and in
each case we have at most $L$ choices for the other. Hence the total number of such pairs is
at most $2TL$. We denote these pairs by $P_1$.

Next we count the number of pairs $(i < j)$ on which $\pi^m$ and $\pi^{m'}$ disagree such that
$|i - j| > L$. Note that for each such pair $\pi^{m'}$ has the "right" answer and we know that in
this case $q(i < j) > (2/3)r$. Each of the elements of $T_1$ participates in such a pair with each
element of $T_2$, save at most $L$ elements for which $|i - j| \le L$. Thus the number of such pairs
is at least $T(T - L)$. We denote them by $P_2$.

The final difference in score between $\pi^{m'}$ and $\pi^m$ is given by

\[ s(\pi^{m'}) - s(\pi^m) = \sum_{(i<j) \in P_1} \big(q(i < j) - q(j < i)\big) + \sum_{(i<j) \in P_2} \big(q(i < j) - q(j < i)\big) \]
\[ \ge (-r) \cdot |P_1| + (r/3) \cdot |P_2| \ge (-r)(2TL) + (r/3)(T^2 - TL) = r\,(T^2/3 - 7TL/3) > 0, \]

since $T > 7L$. Contradiction. □
It follows from Lemma 21 and Claim 19 that the pointwise distance between $\tilde{\pi}$ and $\pi^m$
is bounded by $k = 33L$. We can now apply Lemma 16 to obtain:

Theorem 7. Let $\pi_1, \ldots, \pi_r$ be rankings on $n$ elements independently generated by Mallow's
model with parameter $\beta > 0$, and let $\alpha > 0$. Then a maximum probability order $\pi^m$ can be
computed in time

\[ T(n) = O\!\left(n^{1+O\left(\frac{\alpha}{\beta r}\right)}\, 2^{O\left(\frac{\alpha}{\beta}+\frac{1}{\beta^2}\right)} \log^2 n\right) \]

except with probability $< n^{-\alpha}$. In particular, the running time tends to almost linear as $r$
grows.

Remark. It should be noted that since the $\pi_j$'s are actual orderings, they can be recovered
with $O(n \log n)$ pairwise comparison queries each. Thus the total query complexity is trivially
bounded by $O(r n \log n)$.

4 Noisy comparisons aggregation 

4.1 The Discrepancy between the true order and optimal orders 

The goal of this section is to establish that with high probability any optimum solution will 
not be far from the original solution. We first establish that the orders are close on average, 
and then that they are pointwise close to each other. 

4.1.1 Average proximity 

We prove that with high probability, the total difference between the original and any optimal 
ordering is linear in the length of the interval. 

We begin by bounding the probability that a specific permutation $\sigma$ will beat the original
ordering. Recall that $d_K(\sigma)$ is the number of pairs on which the permutation $\sigma$ disagrees
with the identity.

Lemma 22. Assume that the distributions of the scoring functions are strongly $\gamma$-biased,
and suppose that the original ordering is $1 < 2 < \ldots < n$. Let $\sigma$ be another permutation. Then
the probability that $\sigma$ beats the identity permutation is bounded from above by

\[ 2^{-\gamma d_K(\sigma)}. \]

Proof. In order for $\sigma$ to beat the identity, it needs to beat it in the $d_K(\sigma)$ positions where
they differ. The probability bound follows immediately from the definition of $\gamma$-biased
distributions. □

Recall that $d(\tau) = \sum_{i=1}^{n} |\tau(i) - i|$ is the total dislocation of elements under $\tau$.

Lemma 23. The number of permutations $\tau$ on $[n]$ satisfying $d(\tau) \le cn$ is at most

\[ 2^n \cdot 2^{(1+c)\, n\, H(1/(1+c))}. \]

Here $H(x)$ is the binary entropy of $x$ defined by

\[ H(x) = -x \log_2 x - (1 - x) \log_2(1 - x) < -2x \log_2 x, \]

for small $x$.

Proof. Note that each $\tau$ can be uniquely specified by the values of $s(i) = \tau(i) - i$, and that we are given that $\sum_i |s(i)|$ is exactly $d(\tau) \le cn$. Thus there is an injection of $\tau$'s with $d(\tau) = m$ into sequences of $n$ numbers which in absolute values add up to $m$. It thus suffices to bound the number of such sequences. The number of unsigned sequences equals the number of ways of placing $m$ balls in $n$ bins, which is equal to $\binom{n+m-1}{n-1}$. Signs multiply the possibilities by at most $2^n$. Hence the total number of $\tau$'s with $d(\tau) = m$ is bounded by $2^n \cdot \binom{n+m-1}{n-1}$. Summing up over the possible values of $m$ we obtain

$$\sum_{m=0}^{cn} 2^n \binom{n+m-1}{n-1} = 2^n \binom{n+cn}{n} \le 2^n \cdot 2^{(n+cn) H(n/(n+cn))}.$$

□
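As a sanity check on the counting bound of Lemma 23, the sketch below enumerates all permutations for small $n$ and compares the number with total dislocation at most $cn$ against $2^n \cdot 2^{(1+c)nH(1/(1+c))}$. The helper names are assumptions made for illustration, and the enumeration is exhaustive, so it is only practical for very small $n$.

```python
from itertools import permutations
from math import log2

def binary_entropy(x):
    """H(x) = -x log2 x - (1-x) log2 (1-x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def count_close_permutations(n, c):
    """Number of permutations tau of 0..n-1 with sum |tau(i) - i| <= c*n."""
    return sum(
        1
        for tau in permutations(range(n))
        if sum(abs(t - i) for i, t in enumerate(tau)) <= c * n
    )

def lemma23_bound(n, c):
    """The upper bound 2^n * 2^((1+c) n H(1/(1+c))) from Lemma 23."""
    return 2 ** n * 2 ** ((1 + c) * n * binary_entropy(1 / (1 + c)))

if __name__ == "__main__":
    for n in range(1, 8):
        for c in (0.5, 1, 2):
            print(n, c, count_close_permutations(n, c), round(lemma23_bound(n, c)))
```

For such small $n$ the bound is far from tight; it matters only through its exponential rate in $n$.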

Lemma 24. Suppose that the true ordering is $1 < \ldots < n$ and $n$ is large enough. Then if $c > 1$ and

$$\gamma c > 4 \cdot (1 + (1 + c) H(1/(1 + c))),$$

the probability that any ranking $\sigma$ is optimal and $d(\sigma) > cn$ is at most $2^{-\gamma c n/5}$ for sufficiently large $n$. In particular, as $\gamma \to 0$, it suffices to take

$$c = O(\gamma^{-1} \log (1/\gamma)) = \tilde{O}(\gamma^{-1}).$$

Proof. Let $\sigma$ be an ordering with $d(\sigma) > cn$. Then by Claim 15 we have $d_x(\sigma) > cn/2$. Therefore the probability that such an ordering will beat the identity is bounded by $2^{-\gamma c n/2}$ by Lemma 22. We now use the union bound and Lemma 23 to obtain the desired result. □
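The step from $d(\sigma) > cn$ to $d_x(\sigma) > cn/2$ relies on Claim 15, which is stated outside this section. The sketch below checks exhaustively, for small $n$, the classical relation between the two distances (number of inverted pairs versus total dislocation), namely $d_x \le d \le 2 d_x$; we assume this is the content of the claim being used. The function names are illustrative.

```python
from itertools import permutations

def inversions(sigma):
    """d_x: number of pairs i < j with sigma[i] > sigma[j]."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

def dislocation(sigma):
    """d: total dislocation, the sum of |sigma[i] - i| over all i."""
    return sum(abs(s - i) for i, s in enumerate(sigma))

def relation_holds(n):
    """Check d_x <= d <= 2 d_x for every permutation of 0..n-1."""
    return all(
        inversions(s) <= dislocation(s) <= 2 * inversions(s)
        for s in permutations(range(n))
    )

if __name__ == "__main__":
    print(all(relation_holds(n) for n in range(1, 8)))
```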

4.1.2 Pointwise proximity 

In the previous section we have seen that it is unlikely that the average element in the optimal order is more than a constant number of positions away from its original location. Our next goal is to show that the maximum dislocation of an element is bounded by $O(\log n)$. As a first step, we show that one "big" dislocation is likely to entail many "big" dislocations.

Lemma 25. Suppose that the true ordering of $1, \ldots, n$ is given by the identity ranking, that is, $1 < 2 < \ldots < n$. Let $1 \le i < j \le n$ be two indices and $m = j - i$. Let $A_{ij}$ be the event that there is an optimum ordering $\sigma$ such that $\sigma(i) = j$ and the following two conditions hold:

$$[i, j] \subset \sigma[i - 2m, j + 2m],$$

$$|(\sigma[1, i - \ell - 1] \cup \sigma[j + \ell + 1, n]) \cap [i, j - 1]| \le \ell,$$

i.e., the positions in $[i, j]$ are mapped to by $\sigma$ from elements at most $2m$ away, and at most $\ell$ elements are mapped to the interval $[i, j - 1]$ from outside the interval $[i - \ell, j + \ell]$ by $\sigma$. We set
I = \_Cem\ < m, where 

= 300(1 -log7) = 

Then

$$P[A_{ij}] < 2^{-m\gamma/2}.$$

Proof. We prove the lemma by applying a union bound over all possible variants of the set $B = \sigma^{-1}[i, j]$. We know that $B$ may contain a subset of size at most $3\ell$ of elements coming from $[i - m, i - 1] \cup [j + 1, j + m]$, thus the number of possible sets is bounded by

$$\binom{2m}{3\ell} < 2^{m\gamma/2}.$$

The assumption that $\sigma$ is optimal implies in particular that moving the $i$-th element from the $j$-th position where it is mapped by $\sigma$ back to the $i$-th position does not improve the solution. For each specific choice of $B$, more than 2/3 of the elements that are mapped to $[i, j - 1]$ are originally smaller than $i$, and hence the probability of moving the $i$-th element back not improving the solution is bounded by $2^{-m\gamma}$. By the union bound,

$$P[A_{ij}] < 2^{m\gamma/2} \cdot 2^{-m\gamma} = 2^{-m\gamma/2}.$$

□

As a corollary to Lemma 25 we obtain the following using a simple union bound. For the rest of the proof all the logs are base 2.

Corollary 26. Let

$$m_1 = (-\log \varepsilon + 2 \log n)/(\gamma/2) = O((-\log \varepsilon + \log n)/\gamma),$$

then $A_{ij}$ does not occur for any $i, j$ with $|i - j| \ge m_1$ with probability $> 1 - \varepsilon$.

Next, we formulate a corollary to Lemma 24.

Corollary 27. Suppose that $1 < 2 < \ldots < n$ is the true ordering. Set

$$m_2 = 2 m_1.$$

For each interval $I = [i, \ldots, j]$ with at least $m_2$ elements consider all the sets $S_I$ which contain the elements from

$$I^- = [i + m_2, \ldots, j - m_2],$$

and are contained in the interval

$$I^+ = [i - m_2, \ldots, j + m_2].$$

Then with probability $> 1 - \varepsilon$ all such sets $S_I$ do not have an optimal ordering that has a total deviation from the true ordering of more than $c_2 |i - j|$, with

C2 = - = 0(7-^), 

7 

a constant. 

Proof. There are at most $n^2 \cdot 2^{4 m_2}$ such sets. The probability of each set not satisfying the conclusion is bounded by Lemma 24 with

$$2^{-c_2 m_2 \gamma/5} \le 2^{-7 m_2} = 2^{-m_2} \cdot 2^{-2 m_2} \cdot 2^{-4 m_2} \le \varepsilon \cdot n^{-2} \cdot 2^{-4 m_2}.$$

The last inequality holds because $m_2 > \max(\log n, -\log \varepsilon)$. By taking a union bound over all the sets we obtain the statement of the corollary. □

We are now ready to prove the main result on the pointwise distance between an optimal 
ordering and the original. 



Lemma 28. Assuming that the events from Corollaries 26 and 27 hold, it follows that for each optimal ordering $\sigma$ and for each $i$, $|i - \sigma(i)| < c_3 \log n$, where



24 rrin 

C3 = — ■ = 0(7-^(-log£/logn + 1)) 
q log n 



is a constant. In particular, this conclusion holds with probability $> 1 - 2\varepsilon$.

Proof. We say that a position $i$ is good if there is no index $j$ such that $\sigma(j)$ is on the other side of $i$ from $j$ and $|\sigma(j) - j| \ge m_2$. In other words, $i$ is good if there is no "long" jump over $i$ in $\sigma$. In the case when $i = j$ or $i = \sigma(j)$ for a long jump, it is not considered good. An index that is not good is bad. An interval $I$ is bad if all of its indices are bad. Our goal is to show that there are no bad intervals of length $\ge c_3 \log n$. This would prove the lemma, since if there is an $i$ with $|i - \sigma(i)| > c_3 \log n$ then there is a bad interval of length at least $c_3 \log n$.

Assume, for contradiction, that $I = [i, \ldots, i + t - 1]$ is a bad interval of length $t \ge c_3 \log n$, such that $i - 1$ and $i + t$ are both good (or lie beyond the endpoints of $[1, \ldots, n]$). Denote by $S$ the set of elements that is mapped to $I$ by $\sigma$. Denote the indices in $S$ in their original order by $i_1 < i_2 < \cdots < i_t$, i.e., we have $\{\sigma(i_1), \ldots, \sigma(i_t)\} = I$.

By the goodness of the endpoints of $I$ we have

$$[i + m_2, i + t - 1 - m_2] \subset \{i_1, \ldots, i_t\} \subset [i - m_2, i + t - 1 + m_2].$$

Denote the permutation induced by $\sigma$ on $S$ by $\sigma'$, so $\sigma(i_j) < \sigma(i_{j'})$ is equivalent to $\sigma'(j) < \sigma'(j')$. The permutation $\sigma'$ is optimal, for otherwise it would have been possible to improve $\sigma$ by improving $\sigma'$.

By Corollary 27 and Claim 15 we have the following bound on the number of switches under $\sigma'$ (and hence the number of switches on the elements of $S$ between themselves under $\sigma$):

$$d_x(\sigma') \le d(\sigma') \le c_2 t.$$

In how many switches can the elements of $S$ participate under $\sigma$? They participate in switches with other elements of $S$ to a total of $d_x(\sigma')$. In addition, they participate in switches with elements that are not in $S$. These elements must originate at the margins of the interval $I$: either in the interval $[i - m_2, i + m_2]$ or the interval $[i + t - 1 - m_2, i + t - 1 + m_2]$. Thus, each contributes at most $2 m_2$ switches with elements of $S$. There are at most $2 m_2$ such elements. Hence the total number of switches between elements in $S$ and elements in $\bar{S}$ is at most $4 m_2^2$. Hence

$$\sum_{i \in S} |\sigma(i) - i| \le \sum_{i \in S} \#\{\text{switches } i \text{ participates in}\} \le 4 m_2^2 + 2 d_x(\sigma') \le 4 m_2^2 + 2 c_2 t. \quad (18)$$

We assumed that the entire interval $I$ is bad, hence for every position $i$ there is an index $j_i$ such that $|\sigma(j_i) - j_i| \ge m_2$ and such that $i$ is in the interval $J_i = [j_i, \sigma(j_i)]$ (or the interval $[\sigma(j_i), j_i]$, depending on the order). Consider all such $J_i$'s. We will say that an interval $J_i$ is free if there is no interval $J_j$ intersecting it such that $|J_j| > 2 |J_i|$. We will use a Vitali covering lemma argument to show that we can choose a disjoint collection of free intervals whose total length is at least $|I|/5$.

Let $\mathcal{T}$ be the collection of $J_i$'s that are free. We claim that for every $i \in I$ there is an element $J_i \in \mathcal{T}$ such that the "tripling" of $J_i$: $J_i^3 = [j_i - |J_i|, \sigma(j_i) + |J_i|]$ covers $i$. We know that there is an interval $J_1$ that covers $i$. If $J_1$ is free, then we are done. Otherwise, there is an interval $J_2$ that intersects $J_1$ and is at least twice as long. We continue this process until we reach an interval $J_k$ that is free. How far can $i$ be from the endpoints of $J_k$? At most

$$|J_{k-1}| + |J_{k-2}| + \ldots + |J_1| < |J_k|.$$

Thus, the tripling of $J_k$ covers $i$.

The argument now proceeds as follows: Order the intervals in $\mathcal{T}$ in a decreasing length order (break ties arbitrarily). Go through the list and add a $J_i$ to our collection if it is disjoint from all the currently selected intervals. We obtain a collection $J_1, \ldots, J_k$ of disjoint intervals of the form $[j_i, \sigma(j_i)]$. Denote the length of the $i$-th interval by $t_i = |j_i - \sigma(j_i)| \ge m_2$. Let $J_i^5$ be the "quintupling" of the interval $J_i$: $J_i^5 = [j_i - 2 t_i, \sigma(j_i) + 2 t_i]$. We claim that the $J_i^5$'s cover the entire interval $I$. Let $m$ be a position on the interval $I$. Then there is an interval $J$ in $\mathcal{T}$ such that its tripling $J^3$ covers $m$. Choose the longest such interval $J' = [j, \sigma(j)]$. If $J'$ has been selected to our collection then we are done. If not, it means that $J'$ intersects a longer interval $J_i$ that has been selected. This means that the tripling of $J'$ is covered by the quintupled interval $J_i^5$. In particular, $m$ is covered by $J_i^5$. We conclude that

$$t = \mathrm{length}(I) \le \sum_{i=1}^{k} \mathrm{length}(J_i^5) = 5 \sum_{i=1}^{k} t_i.$$

Thus $\sum_{i=1}^{k} t_i \ge t/5$. This concludes the covering argument.
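The greedy selection in the covering argument can be sketched as follows. The code works on arbitrary closed integer intervals rather than on the free intervals of the proof, and the helper names (`pad`, `greedy_disjoint`) are assumptions made for illustration. It checks the property the argument uses: the tripling of every input interval is contained in the quintupling of some selected interval.

```python
import random

def pad(interval, k):
    """Enlarge [lo, hi] by k times its length t = hi - lo on each side.

    k = 1 gives the "tripling", k = 2 the "quintupling".
    """
    lo, hi = interval
    t = hi - lo
    return (lo - k * t, hi + k * t)

def contains(outer, inner):
    """True if the closed interval `outer` contains `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def disjoint(a, b):
    """True if the closed intervals a and b share no point."""
    return a[1] < b[0] or b[1] < a[0]

def greedy_disjoint(intervals):
    """Scan intervals by decreasing length, keeping those disjoint from all kept so far."""
    chosen = []
    for iv in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        if all(disjoint(iv, c) for c in chosen):
            chosen.append(iv)
    return chosen

if __name__ == "__main__":
    rng = random.Random(0)
    intervals = []
    for _ in range(30):
        lo = rng.randint(0, 200)
        intervals.append((lo, lo + rng.randint(1, 40)))
    chosen = greedy_disjoint(intervals)
    covered = all(
        any(contains(pad(c, 2), pad(iv, 1)) for c in chosen) for iv in intervals
    )
    print(len(chosen), covered)
```

The containment holds because every rejected interval intersects a selected interval that is at least as long, which is exactly the step used in the proof.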

We now apply Corollary to the intervals Jj. Since every Jj is free, we conclude that 
on an interval Jj the contribution of the elements of S that are mapped to Jj to the sum of 
deviations under a is at least £f where ii = Cfti. Thus 



ie5 j=l j=l j=l 

> m2 ■ -f ■ C3 log 77, + ■ m2t > 7722 " (4?n2) + 2c2t = Ami + 2C2t, 
o 30 

for sufficiently large $n$. The result contradicts (18) above. Hence there are no bad intervals of length $\ge c_3 \log n$, which completes the proof. □



4.2 The algorithm 

We are now ready to give an algorithm for computing the optimal ordering with high probability in polynomial time. Note that Lemma 28 holds for any interval of length $\le n$ (not just length exactly $n$). Set $\varepsilon = n^{-\alpha - 1}/4$. Given an input, let $S \subset \{1, \ldots, n\}$ be a random set of size $k$. The probability that there is an optimal ordering $\sigma$ of $S$ and an index $i$ such that $|i - \sigma(i)| \ge c_3 \log n$, where

$$c_3 = O(\gamma^{-3}(-\log \varepsilon/\log n + 1)) = O(\gamma^{-3}(\alpha + 1)),$$

is bounded by $2\varepsilon$ by Lemma 28. Let

$$S_1 \subset S_2 \subset \ldots \subset S_n$$

be a randomly selected chain of sets such that $|S_k| = k$. Then the probability that an element of an optimal order of any of the $S_k$'s deviates from its original location by more than $c_3 \log n$ is bounded by $2 n \varepsilon = n^{-\alpha}/2$. We obtain:

Lemma 29. Let $S_1 \subset \ldots \subset S_n$ be a chain of randomly chosen subsets with $|S_k| = k$. Denote by $\sigma_k$ an optimal ordering on $S_k$. Then with probability $\ge 1 - n^{-\alpha}/2$, for each $k$ and for each $i$, $|i - \sigma_k(i)| < c_3 \log n$, where $c_3 = O(\gamma^{-3}(\alpha + 1))$ is a constant.

We are now ready to prove the main result, Theorem 10, which we restate.

Theorem 30. There is an algorithm that runs in time $n^{c_4}$, where

$$c_4 = O(\gamma^{-3}(\alpha + 1))$$

is a constant, that outputs an optimal ordering with probability $\ge 1 - n^{-\alpha}$.
Proof. First, we choose a random chain of sets $S_1 \subset \ldots \subset S_n$, such that $|S_k| = k$. Then by Lemma 29, with probability $1 - n^{-\alpha}/2$, for each optimal order $\sigma_k$ of $S_k$ and for each $i$, $|i - \sigma_k(i)| < c_3 \log n$. We will find the orders iteratively until we reach $\sigma_n$, which will be an optimal order for our problem. Denote $\{a_k\} = S_k - S_{k-1}$. Suppose that we have computed $\sigma_{k-1}$ and we would like to compute $\sigma_k$. We first insert $a_k$ into a location that is close to its original location as follows.

Recall that $c_3 = O(\gamma^{-3}(\alpha + 1)) > (\alpha + 3)/\gamma$. Break $S_k$ into blocks $B_1, B_2, \ldots, B_s$ of length $c_3 \log n$. We claim that with probability $\ge 1 - n^{-\alpha - 1}/2$ we can pinpoint the block $a_k$ belongs to within an error of $\pm 2$, thus locating $a_k$ within $3 c_3 \log n$ of its original location.

Suppose that $a_k$ should belong to block $B_i$. Then by our assumption on $\sigma_{k-1}$, $a_k$ is bigger than any element in $B_1, \ldots, B_{i-2}$ and smaller than any element in $B_{i+2}, \ldots, B_s$. By comparing $a_k$ to each element in the block and taking the sum of the comparison scores, we see that the probability of having an incorrect comparison result with a block $B_j$ is bounded by $n^{-\alpha - 2}/2$. Hence the probability that $a_k$ will not be placed correctly up to an error of two blocks is bounded by $n^{-\alpha - 1}/2$ using the union bound.
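The block-location step can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are ours, not the paper's: comparison scores are $\pm 1$ answers that are correct with probability $1/2 + $ `bias`, the current list is in its true order, and the block index is estimated as the number of blocks whose summed score says the new element is larger. The names `noisy_compare` and `locate_block` are hypothetical.

```python
import random

def noisy_compare(x, y, bias, rng):
    """Return +1 if the noisy oracle reports x > y, else -1.

    The report is correct with probability 1/2 + bias (assumed model).
    """
    truth = 1 if x > y else -1
    return truth if rng.random() < 0.5 + bias else -truth

def locate_block(new_item, sorted_items, block_len, bias, rng):
    """Estimate the index of the block that `new_item` should join.

    Each block's score is the sum of the comparison scores against all
    of its elements; a positive sum is read as 'new_item lies above
    this block'. The estimate is the number of such blocks.
    """
    blocks = [sorted_items[k:k + block_len]
              for k in range(0, len(sorted_items), block_len)]
    above = 0
    for block in blocks:
        score = sum(noisy_compare(new_item, y, bias, rng) for y in block)
        if score > 0:
            above += 1
    return above

if __name__ == "__main__":
    rng = random.Random(0)
    items = list(range(0, 2000, 2))   # true order: even numbers
    guess = locate_block(1001, items, block_len=50, bias=0.2, rng=rng)
    print("estimated block:", guess, "true block:", 501 // 50)
```

Each pair is queried only once here, in line with the model of this section; summing over a whole block is what makes the block-level decision reliable even though individual answers are noisy.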

Hence after inserting $a_k$ we obtain an ordering of $S_k$ in which each element is at most $3 c_3 \log n$ positions away from its original location. Hence each element is at most $4 c_3 \log n$ positions away from its optimal location in $\sigma_k$. Thus, by Lemma 18 we can obtain $\sigma_k$ in time $O(n^{24 c_3 + 2})$. The process is then repeated.

The probability of each stage failing is bounded by $n^{-\alpha - 1}/2$. Hence the probability of the algorithm failing assuming the chain $S_1 \subset \ldots \subset S_n$ satisfies Lemma 29 is bounded by $n^{-\alpha}/2$. Thus the algorithm runs in time $O(n^{24 c_3 + 3}) = n^{O(\gamma^{-3}(\alpha + 1))}$ and has failure probability of at most $n^{-\alpha}/2 + n^{-\alpha}/2 = n^{-\alpha}$. □

4.3 Query Complexity 

In this section we outline the proof of Theorem 12. Recall that the theorem states that although the running time of the algorithm is a polynomial in $n$ whose degree depends on $\gamma$, the query complexity of a variant of the algorithm is $O(n \log n)$. We demonstrate that our algorithm can be implemented with high probability using only $O(n \log n)$ queries. Note that there are two types of queries in the algorithm. The first type is comparing elements in the dynamic programming, while the second is when inserting new elements. We will show that both parts require only $O(n \log n)$ queries. We start with queries in the dynamic programming part.

Lemma 31. For all $\alpha > 0$, $\gamma < 1/2$ there exists $c(\alpha, \gamma) < \infty$ such that the total number of comparisons performed in the dynamic programming stage of the algorithm is at most $c n \log n$ except with probability $O(n^{-\alpha}/4)$.

Proof. Recall that in the dynamic programming stage, each element is compared with elements that are at current distance at most $c_3 \log n$ from it, where $c_3 = c_3(\alpha, \gamma) = O(\gamma^{-3}(\alpha + 1))$.

Consider a random insertion order of the elements $a_1, \ldots, a_n$. Let $S_{n/2}$ denote the set of elements inserted up to the $n/2$-th insertion. Then by standard concentration results it follows that there exists $c_5(c_3, \alpha)$ such that for all $1 \le i \le n - c_5 \log n$ it holds that

$$|[a_i, a_i + c_5 \log n] \cap S_{n/2}| \ge c_3 \log n, \quad (19)$$

and for all $c_5 \log n \le i \le n$ it holds that

$$|[a_i - c_5 \log n, a_i] \cap S_{n/2}| \ge c_3 \log n \quad (20)$$

except with probability at most $n^{-\alpha - 1}$. Note that when (19) and (20) both hold, the number of different queries used in the dynamic programming while inserting the elements from $\{a_1, \ldots, a_n\} \setminus S_{n/2}$ is at most $2 c_5 n \log n$, since none of these elements is ever compared to an element that is further than $c_5 \log n$ away from it in the true order.

Repeating the argument above for the insertions performed from $S_{n/4}$ to $S_{n/2}$, from $S_{n/8}$ to $S_{n/4}$, etc., we obtain that the total number of queries used is bounded by:

$$2 c_5 \log n \, (n + n/2 + \ldots + 1) \le 4 c_5 n \log n,$$

except with probability $< n^{-\alpha}/4$. This concludes the proof. □
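For completeness, the two estimates at the end of this proof can be spelled out as follows. The first line is the geometric series; the second is one way to account for the failure probability, assuming (our reading, not stated explicitly in the text) that each of the at most $\log_2 n$ halving rounds fails with probability at most $n^{-\alpha-1}$.

```latex
\begin{align*}
2 c_5 \log n \left( n + \frac{n}{2} + \frac{n}{4} + \ldots + 1 \right)
  &\le 2 c_5 \log n \cdot n \sum_{k \ge 0} 2^{-k}
   = 4 c_5\, n \log n, \\
\Pr[\text{some round fails}]
  &\le \log_2 n \cdot n^{-\alpha - 1}
   \le \frac{n^{-\alpha}}{4}
   \qquad \text{whenever } n \ge 4 \log_2 n \text{, i.e. for } n \ge 16.
\end{align*}
```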

Next we show that there is an implementation of insertion that requires only $O(\log n)$ comparisons per insertion. To this end, we recall condition (b) from Definition 9 of strongly $\gamma$-biased distributions.



(b) There is a constant $A$ such that for any $A$ different $\mathcal{D}_{a_k, b_k}$ such that $a_k < b_k$ holds for all the $k$'s,

A 

J]g(afc<6fc)>0 >l-10^l (21) 
.fc=i 

Lemma 32. For all $\alpha > 0$, $A \ge 1$ and $\gamma > 0$ there exists a

$$C(A, \gamma, \alpha) = O((A + \gamma^{-3})(\alpha + 1))$$

such that except with probability $O(n^{-\alpha - 1}/2)$ it is possible to perform the insertion in the proof of Theorem 30 so that each element is inserted using at most $C \log n$ comparisons, $O(\log n)$ time and the element is placed a distance of at most $4 c_3 \log n$ from its optimal location, as required by the algorithm.

Proof. Below we maintain the notation that $c_3(\alpha, \gamma) = O(\gamma^{-3}(\alpha + 1))$ is such that at all stages of the insertion and for each item, the distance between the location of the item in the original order and the optimal order is at most $c_3 \log n$. This will result in an error with probability at most $n^{-\alpha}/2$.

Let $c_6 = O(\alpha + 1)$ be chosen so that



Bin{cQ log n, 0.99) < — log n + 2 log2 n 



< n 



~a-3 



(22) 
Let $c_7 = A c_6 + 4 c_3$.

We now describe an insertion step. Let $S$ denote a currently optimally sorted set. We will partition $S$ into consecutive intervals of length between $c_7 \log n$ and $2 c_7 \log n$, denoted $I_1, \ldots, I_t$. We will use the notation $I_i'$ for the sub-interval of $I_i = [s, t]$ defined by $I_i' = [s + 2 c_3 \log n, t - 2 c_3 \log n]$. We say that a newly inserted element $j$ belongs to one of the intervals $I_i$ if one of the two closest elements to it in the original order belongs to $I_i$. Note that $j$ can belong to at most two intervals. An element in $S$ belongs to $I_i$ iff it is one of the elements in $I_i$. Note furthermore that if $j$ belongs to the interval $I_i$ then its optimal insertion location is determined up to $2(A c_6 + 6 c_3) \log n$. Similarly, if we know it belongs to one of two intervals then its optimal insertion location is determined up to

$$c_8 \log n := 4(A c_6 + 6 c_3) \log n.$$

Note that by the choice of $c_3$ we may assume that all elements belonging to $I_i$ are smaller than all elements of $I_j'$ if $i < j$ in the true order. Similarly, all elements belonging to $I_j$ are larger than all elements of $I_i'$ if $j > i$. We define formally the interval $I_0 = I_0'$ to be an interval of elements that are smaller than all the items and the interval $I_{t+1} = I_{t+1}'$ to be an interval of elements that are bigger than all items.

We construct a binary search tree on the set $[1, t]$, labeled by sub-intervals of $[1, t]$, such that the root is labeled by $[1, t]$ and if a node is labeled by an interval $[s_1, s_2]$ with $s_2 - s_1 > 1$ then its two children are labeled by $[s_1, s']$ and $[s', s_2]$, where $s'$ is chosen so that the length of the two intervals is the same up to $\pm 1$. Note that the two sub-intervals overlap at $s'$. This branching process terminates at intervals of the form $[s, s + 1]$. Each such node will have a path of descendants of length $c_6 \log n$ all labeled by $[s, s + 1]$.

We use a variant of binary search described in Section 3 of [FPRU90]. The algorithm will run for $c_6 \log n$ steps starting at the root of the tree. At each step the algorithm will proceed from a node of the tree to either one of the two children of the node or to the parent of that node.

Suppose that the algorithm is at the node labeled by $[s_1, s_2]$ and $s_2 - s_1 > 1$. The algorithm will first take $A$ elements from $I'_{s_1 - 1}$ that have not been explored before and will check that the current item is greater than the majority of them. Similarly, it will make a comparison with $A$ elements from $I'_{s_2 + 1}$. If either test fails it would backtrack to the parent of the current node. Note that if the test fails then it is the case that the element does not belong to $[s_1, s_2]$ except with probability $< 0.01$.

Otherwise, let $[s_1, s']$ and $[s', s_2]$ denote the two children of $[s_1, s_2]$. The algorithm will now perform a majority test against $A$ elements from $I'_{s'}$ according to which it would choose one of the two sub-intervals $[s_1, s']$ or $[s', s_2]$. Note again that a correct sub-interval is chosen except with probability at most 0.01 (note that in this case there may be two "correct" intervals).

In the case where $s_2 = s_1 + 1$ we perform only the first test. If it fails we move to the parent of the node. If it succeeds, we move to the single child. Again, note that we will move toward the leaf if the interval is correct with probability at least 0.99. Similarly, we will move away from the leaf if the interval is incorrect with probability at least 0.99.
Overall, the analysis shows that at each step we move toward a leaf including the correct interval with probability at least 0.99. From (22) it follows that with probability at least $1 - n^{-\alpha - 3}$ after $c_6 \log n$ steps the label of the current node will be $[s, s + 1]$ where the inserted element belongs to either $I_s$ or $I_{s+1}$. Thus the total number of queries is bounded by $3 A c_6 \log n$.
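The backtracking search can be sketched as follows. This is a simplified illustration in the spirit of [FPRU90], not the procedure above: it assumes each comparison may be repeated independently, with each answer correct with probability $1/2 + $ `bias`, whereas the text takes majorities over $A$ distinct, previously unexplored neighbouring elements. Tree nodes are ranges $[lo, hi]$ of candidate insertion positions, and all names and parameters are hypothetical.

```python
import random

def majority_greater(x, y, reps, bias, rng):
    """Majority vote over `reps` independent noisy answers to 'is x > y?'."""
    votes = 0
    for _ in range(reps):
        truth = x > y
        answer = truth if rng.random() < 0.5 + bias else not truth
        votes += 1 if answer else -1
    return votes > 0

def noisy_insert_position(x, items, steps, reps, bias, rng):
    """Random walk on the search tree with backtracking.

    `items` is in true sorted order and x is not in it. Returns the
    range (lo, hi) of insertion positions labelling the final node.
    """
    n = len(items)
    path = [(0, n)]                      # root: every position is possible
    for _ in range(steps):
        lo, hi = path[-1]
        # Consistency tests against the two boundaries of the current range.
        ok_left = lo == 0 or majority_greater(x, items[lo - 1], reps, bias, rng)
        ok_right = hi == n or not majority_greater(x, items[hi], reps, bias, rng)
        if not (ok_left and ok_right):
            if len(path) > 1:
                path.pop()               # backtrack to the parent
            continue
        if lo == hi:
            path.append((lo, hi))        # descend along the chain below a leaf
        else:
            mid = (lo + hi) // 2
            if majority_greater(x, items[mid], reps, bias, rng):
                path.append((mid + 1, hi))
            else:
                path.append((lo, mid))
    return path[-1]

if __name__ == "__main__":
    rng = random.Random(0)
    items = list(range(0, 2000, 2))
    print(noisy_insert_position(1001, items, steps=60, reps=5, bias=0.3, rng=rng))
```

The chain of identical nodes below each leaf plays the same role as the path of descendants labeled $[s, s+1]$: once the walk reaches the correct leaf, further successful tests push it deeper, so a few later errors cannot move it to a different leaf.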

Now, once we have located the element within $c_8 \log n$ positions, we can refine the search by comparing the element to the relevant blocks $B_j$ from the algorithm in Theorem 30. This will take at most $c_8 \log n$ more queries, to a grand total of

$$c_8 \log n + 3 A c_6 \log n = O((A + \gamma^{-3})(\alpha + 1) \log n)$$

queries to execute the insertion step of the algorithm. This concludes the proof. □